Kernel-based independence tests for causal structure learning on functional data
Measurements of systems taken along a continuous functional dimension, such
as time or space, are ubiquitous in many fields, from the physical and
biological sciences to economics and engineering. Such measurements can be
viewed as realisations of an underlying smooth process sampled over the
continuum. However, traditional methods for independence testing and causal
learning are not directly applicable to such data, as they do not take into
account the dependence along the functional dimension. By using specifically
designed kernels, we introduce statistical tests for bivariate, joint, and
conditional independence for functional variables. Our method not only extends
the Hilbert-Schmidt independence criterion (HSIC) and its d-variate version
(d-HSIC) to functional data, but also introduces a test for conditional
independence by defining a novel statistic for the conditional permutation
test (CPT) based on the Hilbert-Schmidt conditional independence criterion
(HSCIC), with optimised regularisation strength estimated through an
evaluation rejection rate. Our
empirical results on the size and power of these tests on synthetic functional
data show good performance, and we then exemplify their application to several
constraint- and regression-based causal structure learning problems, including
both synthetic examples and real socio-economic data.
Kernel-based Joint Independence Tests for Multivariate Stationary and Non-stationary Time Series
Multivariate time series data that capture the temporal evolution of
interconnected systems are ubiquitous in diverse areas. Understanding the
complex relationships and potential dependencies among co-observed variables is
crucial for the accurate statistical modelling and analysis of such systems.
Here, we introduce kernel-based statistical tests of joint independence in
multivariate time series by extending the d-variable Hilbert-Schmidt
independence criterion (dHSIC) to encompass both stationary and non-stationary
processes, thus allowing broader real-world applications. By leveraging
resampling techniques tailored for both single- and multiple-realisation time
series, we show how the method robustly uncovers significant higher-order
dependencies in synthetic examples, including frequency mixing data and logic
gates, as well as real-world climate and socioeconomic data. Our method adds to
the mathematical toolbox for the analysis of multivariate time series and can
aid in uncovering high-order interactions in data.
Comment: 15 pages, 7 figures
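The joint independence idea in the abstract above can be illustrated with a minimal sketch. It assumes i.i.d. samples rather than time series, a Gaussian kernel with a fixed bandwidth, and a plain permutation null; these are simplifications chosen for illustration, not the resampling schemes of the paper. The function names (gaussian_gram, dhsic, dhsic_permutation_test) are hypothetical. The example uses an XOR logic gate, where the three variables are pairwise independent but jointly dependent.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for n samples (rows of x)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def dhsic(grams):
    """Biased (V-statistic) dHSIC estimate from a list of d Gram matrices."""
    n = grams[0].shape[0]
    joint = np.ones((n, n))   # elementwise product of all Gram matrices
    marginals = 1.0           # product of the overall Gram means
    cross = np.ones(n)        # product of the row means
    for gram in grams:
        joint *= gram
        marginals *= gram.mean()
        cross *= gram.mean(axis=1)
    return joint.mean() + marginals - 2.0 * cross.mean()


def dhsic_permutation_test(samples, n_perm=200, seed=0):
    """Permutation p-value; assumes i.i.d. samples (illustrative only)."""
    rng = np.random.default_rng(seed)
    grams = [gaussian_gram(s) for s in samples]
    observed = dhsic(grams)
    n = grams[0].shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Keep the first variable fixed and shuffle the others independently.
        permuted = [grams[0]]
        for gram in grams[1:]:
            idx = rng.permutation(n)
            permuted.append(gram[np.ix_(idx, idx)])
        exceed += int(dhsic(permuted) >= observed)
    return observed, (1 + exceed) / (1 + n_perm)


# XOR gate: x and y are fair coin flips, z = x XOR y.
rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=300)
y = rng.integers(0, 2, size=300)
z = np.bitwise_xor(x, y)
statistic, p_value = dhsic_permutation_test([x, y, z])
print(f"dHSIC = {statistic:.4f}, permutation p-value = {p_value:.3f}")
```

A small p-value here indicates joint dependence among the three variables. For serially dependent data, the simple shuffling above would not be valid and a resampling scheme suited to time series would be needed.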
Kernel Two-Sample and Independence Tests for Non-Stationary Random Processes
Two-sample and independence tests with the kernel-based MMD and HSIC have
shown remarkable results on i.i.d. data and stationary random processes.
However, these statistics are not directly applicable to non-stationary random
processes, a prevalent form of data in many scientific disciplines. In this
work, we extend the application of MMD and HSIC to non-stationary settings by
assuming access to independent realisations of the underlying random process.
These realisations - in the form of non-stationary time-series measured on the
same temporal grid - can then be viewed as i.i.d. samples from a multivariate
probability distribution, to which MMD and HSIC can be applied. We further show
how to choose suitable kernels over these high-dimensional spaces by maximising
the estimated test power with respect to the kernel hyper-parameters. In
experiments on synthetic data, we demonstrate superior performance of our
proposed approaches in terms of test power when compared to current
state-of-the-art functional or multivariate two-sample and independence tests.
Finally, we employ our methods on a real socio-economic dataset as an example
application.
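The central step described above, treating independent realisations measured on a common time grid as i.i.d. vectors and applying MMD, can be sketched as follows. This is a minimal illustration under stated assumptions: the bandwidth is set by the median heuristic rather than by maximising estimated test power as the abstract describes, the null distribution is obtained by permuting realisations, and the synthetic random walks with a linear drift are invented for the example. The function names are hypothetical.

```python
import numpy as np


def squared_distances(a, b):
    """Pairwise squared Euclidean distances between rows of a and b."""
    d2 = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :]
    d2 -= 2.0 * a @ b.T
    return np.maximum(d2, 0.0)  # clip tiny negatives from round-off


def mmd_permutation_test(x, y, n_perm=200, seed=0):
    """Biased MMD^2 with a Gaussian kernel and a permutation p-value.

    x, y: arrays of shape (n_realisations, n_time_points); each row is one
    realisation on the shared grid and is treated as a single sample.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n = len(x)
    d2 = squared_distances(pooled, pooled)
    # Median heuristic (an assumption, not the power-maximising choice).
    bandwidth_sq = np.median(d2[d2 > 0])
    gram = np.exp(-d2 / (2.0 * bandwidth_sq))

    def mmd2(order):
        a, b = order[:n], order[n:]
        return (gram[np.ix_(a, a)].mean() + gram[np.ix_(b, b)].mean()
                - 2.0 * gram[np.ix_(a, b)].mean())

    observed = mmd2(np.arange(len(pooled)))
    exceed = sum(int(mmd2(rng.permutation(len(pooled))) >= observed)
                 for _ in range(n_perm))
    return observed, (1 + exceed) / (1 + n_perm)


# Synthetic non-stationary data: random walks, one group with added drift.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)
walks_a = 0.1 * np.cumsum(rng.normal(size=(60, 50)), axis=1)
walks_b = 0.1 * np.cumsum(rng.normal(size=(60, 50)), axis=1) + 2.0 * t
mmd_value, mmd_p = mmd_permutation_test(walks_a, walks_b)
print(f"MMD^2 = {mmd_value:.4f}, permutation p-value = {mmd_p:.3f}")
```

Permuting whole realisations is valid here only because the realisations are assumed independent of one another; the time points within a realisation are never shuffled.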
Non-linear interlinkages and key objectives amongst the Paris Agreement and the Sustainable Development Goals
The United Nations' ambitions to combat climate change and advance human
development are manifested in the Paris Agreement and the Sustainable
Development Goals (SDGs), respectively. These are inherently inter-linked as
progress towards some of these objectives may accelerate or hinder progress
towards others. We investigate how these two agendas influence each other by
defining networks of 18 nodes, consisting of the 17 SDGs and climate change,
for various groupings of countries. We compute a non-linear measure of
conditional dependence, the partial distance correlation, given any subset of
the remaining 16 variables. These correlations are treated as weights on edges,
and weighted eigenvector centralities are calculated to determine the most
important nodes. We find that SDG 6, clean water and sanitation, and SDG 4,
quality education, are most central across nearly all groupings of countries.
In developing regions, SDG 17, partnerships for the goals, is strongly
connected to the progress of other objectives in the two agendas whilst,
somewhat surprisingly, SDG 8, decent work and economic growth, is not as
important in terms of eigenvector centrality.
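The centrality step described above can be sketched on a toy network. The example assumes a symmetric, non-negative weight matrix for a connected graph; the four nodes and their weights are invented for illustration and are not values from the study, and the function name eigenvector_centrality is hypothetical.

```python
import numpy as np


def eigenvector_centrality(weights):
    """Weighted eigenvector centrality of an undirected, connected graph.

    weights: symmetric non-negative matrix; entry (i, j) is the edge weight.
    Returns the leading eigenvector, normalised to sum to one.
    """
    w = np.asarray(weights, dtype=float)
    if not np.allclose(w, w.T):
        raise ValueError("weight matrix must be symmetric")
    eigenvalues, eigenvectors = np.linalg.eigh(w)
    leading = eigenvectors[:, np.argmax(eigenvalues)]
    # The leading eigenvector of a connected non-negative graph has entries
    # of a single sign; taking absolute values removes the arbitrary sign.
    leading = np.abs(leading)
    return leading / leading.sum()


# Hypothetical 4-node network: node 0 is strongly tied to all other nodes.
toy_weights = np.array([
    [0.0, 0.8, 0.7, 0.6],
    [0.8, 0.0, 0.1, 0.1],
    [0.7, 0.1, 0.0, 0.1],
    [0.6, 0.1, 0.1, 0.0],
])
centrality = eigenvector_centrality(toy_weights)
most_central = int(np.argmax(centrality))
print("centralities:", np.round(centrality, 3))
print("most central node:", most_central)
```

In this toy network node 0 comes out as the most central, since it carries the largest weights to every other node.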
A community-based validation study of the Short-Form 36 version 2 Philippines (Tagalog) in two cities in the Philippines
DOI: 10.1371/journal.pone.0083794, PLoS ONE 8(12)
Kernel Two-Sample and Independence Tests for Nonstationary Random Processes
Two-sample and independence tests with the kernel-based MMD and HSIC have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to nonstationary random processes, a prevalent form of data in many scientific disciplines. In this work, we extend the application of MMD and HSIC to nonstationary settings by assuming access to independent realisations of the underlying random process. These realisations, in the form of nonstationary time series measured on the same temporal grid, can then be viewed as i.i.d. samples from a multivariate probability distribution, to which MMD and HSIC can be applied. We further show how to choose suitable kernels over these high-dimensional spaces by maximising the estimated test power with respect to the kernel hyperparameters. In experiments on synthetic data, we demonstrate superior performance of our proposed approaches in terms of test power when compared to current state-of-the-art functional or multivariate two-sample and independence tests. Finally, we employ our methods on a real socioeconomic dataset as an example application.